36 ◾ Bioinformatics
directory “preprocessing”, then download the FASTQ files for run SRR957824 from the NCBI
SRA database, and finally keep the second read file and rename it to “bad.fastq” for practice.
The script then generates the FastQC report and opens it in the Firefox browser.
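# Create a working directory and move into it
mkdir -p preprocessing
cd preprocessing
# Download the reads for run SRR957824 from the NCBI SRA (written as _1 and _2 files)
fasterq-dump --verbose SRR957824
# Keep only the second read file and rename it for practice
rm SRR957824_1.fastq
mv SRR957824_2.fastq bad.fastq
# Run FastQC on the remaining FASTQ file
fqfile=$(ls *.fastq)
fastqc "$fqfile"
# Open the resulting HTML report in Firefox
htmlfile=$(ls *.html)
firefox "$htmlfile"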
When all the commands have run in sequence without an error, the QC report will be
displayed in the Firefox browser. Study the report carefully and identify any potential
problems in the quality metrics that we discussed in the previous section. Figure 1.29
shows that the reads in the file have three failures and a single warning. Next, we will
try to fix these problems as far as possible.
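The failures and warnings can also be listed on the command line instead of being read from the browser. The following is a minimal sketch, not part of the original script. It assumes that FastQC wrote an archive named “bad_fastqc.zip” next to “bad.fastq” and that the archive contains “bad_fastqc/summary.txt” with one tab-separated line per module (status, module name, file name). The function name summarize_qc is hypothetical and introduced only for illustration.

```shell
# Sketch: list the FastQC modules that did not pass.
# Assumption: summary.txt has tab-separated lines of the form
#   STATUS<TAB>module name<TAB>file name
# where STATUS is PASS, WARN, or FAIL.

# summarize_qc (hypothetical helper) reads a summary.txt on standard input,
# prints every WARN and FAIL module, then prints the totals.
summarize_qc() {
    awk -F'\t' '
        $1 == "FAIL" { fail++; print "FAIL\t" $2 }
        $1 == "WARN" { warn++; print "WARN\t" $2 }
        END { printf "failures=%d warnings=%d\n", fail, warn }
    '
}

# Usage, once FastQC has run in the current directory.
# The archive name and inner path are assumptions based on the input file name.
if [ -f bad_fastqc.zip ]; then
    unzip -p bad_fastqc.zip bad_fastqc/summary.txt | summarize_qc
fi
```

The listing gives a quick checklist of the modules to revisit after each fixing step; the totals should agree with the summary panel of the HTML report.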
Using such a FASTQ file in downstream analysis without fixing its quality problems
will negatively affect the results and may lead to misleading conclusions. A good strategy
whenever there are warnings or failures is to try the available ways of fixing them, and
if a problem cannot be fixed, to be aware of it and to understand how it may affect
the results.
FIGURE 1.29 The QC report summary and per base sequence quality for the “bad.fastq” file.